280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification

Authors

  • Amit Gupta
  • Rémi Lebret
  • Hamza Harkous
  • Karl Aberer
Abstract

We propose a simple yet effective approach to inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach leverages the interlanguage links of Wikipedia followed by character-level classifiers to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release presumably the largest and the most accurate multilingual taxonomic resource spanning over 280 languages.
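
The abstract names two steps: projecting an English taxonomy through interlanguage links, then applying character-level classifiers. The following is only a minimal sketch of how those two steps could be wired together, not the authors' implementation. All names (`project_edges`, `char_ngrams`, `CharNgramNB`) are hypothetical, the naive Bayes model is a simple stand-in for whatever classifier the paper uses, and the choice of titles as classifier input is an assumption.

```python
import math
from collections import Counter


def project_edges(english_edges, interlanguage_links):
    """Map (child, parent) is-a edges into a target language.

    `interlanguage_links` is assumed to be a dict from English titles to
    target-language titles. Edges with an unmapped endpoint are dropped.
    """
    projected = set()
    for child, parent in english_edges:
        if child in interlanguage_links and parent in interlanguage_links:
            projected.add((interlanguage_links[child], interlanguage_links[parent]))
    return projected


def char_ngrams(text, n=3):
    """Return character n-grams of a lowercased title with boundary markers."""
    padded = f"^{text.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNB:
    """Tiny multinomial naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}        # label -> Counter of n-gram frequencies
        self.docs = Counter()   # label -> number of training titles
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, gram_counts in self.counts.items():
            # Log prior plus add-one-smoothed log likelihood of each n-gram.
            score = math.log(self.docs[label] / total_docs)
            denom = sum(gram_counts.values()) + len(self.vocab)
            for gram in char_ngrams(text, self.n):
                score += math.log((gram_counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Under these assumptions, the projected edges could serve as labeled examples for training the classifier, which would then score target-language edges that have no English counterpart. Working at the character level avoids the need for language-specific tokenizers, which matters when covering hundreds of languages.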

Similar resources

Classifying Wikipedia Articles into NE's Using SVM's with Threshold Adjustment

In this paper, a method is presented to recognize multilingual Wikipedia named entity articles. This method classifies multilingual Wikipedia articles using a variety of structured and unstructured features and is aided by cross-language links and features in Wikipedia. Adding multilingual features helps boost classification accuracy and is shown to effectively classify multilingual pages in a ...

Need to categorize: A comparative look at the categories of the Universal Decimal Classification system (UDC) and Wikipedia

This study analyzes the differences between the category structure of the Universal Decimal Classification (UDC) system (which is one of the widely used library classification systems in Europe) and Wikipedia. In particular, we compare the emerging structure of category-links to the structure of classes in the UDC. With this comparison we would like to scrutinize the question of how do knowledg...

Rapid Induction of Multiple Taxonomies for Enhanced Faceted Text Browsing

In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific taxonomies from crawled data. The first method involves a sentence-level words co-occurrence frequency method for building the taxonomy, while the second involves the bootstrapping of a Word2Vec based algorithm with a directed crawler. We exploit the multilingual open-content directory of the W...

Document Categorization using Multilingual Associative Networks based on Wikipedia

Associative networks are a connectionist language model with the ability to categorize large sets of documents. In this research we combine monolingual associative networks based on Wikipedia to create a larger, multilingual associative network, using the cross-lingual connections between Wikipedia articles. We prove that such multilingual associative networks perform better than monolingual as...

CEA LIST's Participation at the CLEF CHiC 2013

For our first participation to the CLEF CHiC Lab, we submitted runs to the multilingual ad-hoc and multilingual semantic enrichment tasks. Given the strong multilingual character of the evaluation corpus, the main objectives of the experiments were to test the efficiency of semantic topic expansion and consolidation based on Explicit Semantic Analysis (ESA) versions in different languages. Anot...


Journal title:
  • CoRR

Volume abs/1704.07624  Issue

Pages  -

Publication date 2017